Crystallographic and X-ray scattering study of RdfS, a recombination directionality factor from an integrative and conjugative element

The X-ray crystallographic structure of RdfS reveals molecular superhelical polymers in the crystal.


Introduction
Integrative and conjugative elements (ICEs) are chromosomally integrating mobile genetic elements that transfer between bacteria by conjugation. Prior to conjugation, ICEs must excise from the bacterial chromosome, a process facilitated by an ICE-encoded site-specific recombinase (also known as an integrase) and an additional protein called a recombination directionality factor (RDF, also known as an excisionase; Groth & Calos, 2004; Lewis & Hatfull, 2001; Ramsay et al., 2006). The integrase binds regions of DNA called attachment sites (att sites), which contain a catalytic 'core' site where strand exchange occurs and conserved flanking regions called 'arm' or 'P' sites that orchestrate the structural organization of the nucleoprotein complex (Radman-Livaja et al., 2005). RDFs are often winged-helix-turn-helix domain proteins that also bind DNA within att sites (Sam et al., 2004; Lewis & Hatfull, 2001). DNA binding by the RDF alters the recombinase-DNA nucleoprotein complex and often bends the DNA, switching the favoured direction of recombination towards ICE excision.
ICEMlSymR7A is a 502 kb ICE carried by Mesorhizobium japonicum R7A which confers on its host the ability to fix nitrogen and form a symbiosis with leguminous plants of the genus Lotus (Sullivan & Ronson, 1998; Sullivan et al., 2002). The att sites attL and attR flank the ICE (Sullivan & Ronson, 1998). Integration requires the integrase IntS, a tyrosine recombinase that belongs to the P4 integrase family (Sullivan & Ronson, 1998; Esposito & Scocca, 1997; Ramsay et al., 2006; Verdonk et al., 2019). When the ICE is excised, the att sites are recombined, producing attP on the extrachromosomal circularized ICEMlSymR7A and attB within the bacterial chromosome. Excision of ICEMlSymR7A requires IntS and the RDF RdfS, which when expressed stimulates ICEMlSymR7A excision and the concomitant formation of attP and attB (Ramsay et al., 2006). The expression of rdfS in ICEMlSymR7A is stimulated by quorum sensing (Ramsay et al., 2015). Secondary-structure prediction of the RdfS proteins found in Mesorhizobium spp. suggests that they are members of the MerR superfamily of winged-helix-turn-helix (wHTH) DNA-binding proteins (Lewis & Hatfull, 2001; Haskett et al., 2018). Deletion of rdfS from the R7A chromosome abolishes the excision of ICEMlSymR7A, and overexpression of rdfS results in loss of ICEMlSymR7A from the cell. An intact rdfS is also required for conjugation (Ramsay et al., 2006; Verdonk et al., 2019). RdfS homologues on related ICEs also act as transcriptional activators (Haskett et al., 2016, 2017, 2018). Some RDFs, such as those of the cox family in phages, have also been shown to act as transcriptional regulators (Lewis & Hatfull, 2001; Lundqvist & Bertani, 1984; Dodd et al., 1990; Saha et al., 1987; Esposito & Scocca, 1997; Ahlgren-Berg et al., 2009). RdfS is highly conserved across diverse Mesorhizobium spp. that carry ICEs and also among plasmids, suggesting that RDFs have additional roles in transfer aside from those involved in recombination (Verdonk et al., 2019; Ramsay et al., 2006). There are currently no experimental structures of RdfS homologues (from mesorhizobia or otherwise) in the Protein Data Bank (PDB).
In this study, we cloned and overexpressed RdfS from M. japonicum R7A and determined its X-ray crystal structure to 2.45 Å resolution. We also present a solution scattering model of monomeric RdfS obtained using small-angle X-ray scattering (SAXS).

Expression and purification of RdfS
The rdfS gene (msi109) encoding the 89-residue protein (UniProt ID Q7AL96) was amplified from M. japonicum R7A (GenBank accession CP051772) genomic DNA using PCR (5′-ATATCCATGGACGACGAAAACGACCGC-3′ and 5′-ATATGGATCCTTATCATGAGCGGGCTCCCTCG-3′; the primers carry the NcoI site CCATGG and the BamHI site GGATCC, respectively). The PCR product was cloned into the NcoI/BamHI sites of the pETM11 expression vector (European Molecular Biology Laboratory; Dümmler et al., 2005) using T4 DNA ligase (New England Biolabs; NEB) as per the manufacturer's instructions. The insert was confirmed by PCR and subsequent Sanger sequencing (Australian Genome Research Facility). The plasmid was transformed into electrocompetent Escherichia coli NiCo21(DE3) cells (NEB) by electroporation and transformants were selected on lysogeny broth (LB) agar medium containing kanamycin (50 µg ml−1). Single colonies were inoculated into 5 ml LB (50 µg ml−1 kanamycin) and incubated for 16 h at 310 K with shaking at 160 rev min−1. The 5 ml culture was used to inoculate 1 l nonselective LB in a 5 l conical flask, which was incubated at 310 K with shaking at 160 rev min−1 until an optical density (600 nm) of ~0.5 was reached. The cells were then induced using isopropyl β-D-1-thiogalactopyranoside at a final concentration of 0.1 mM and grown for an additional 16 h at 293 K with shaking at 160 rev min−1. The cells were harvested by centrifugation at 20 000g for 45 min at 4 °C. The cell pellets were resuspended in wash buffer [80 mM NaH2PO4, 500 mM NaCl, 80 mM imidazole, 5%(v/v) glycerol; pH 7.4] before being lysed using an Emulsiflex C5 high-pressure homogenizer (Avestin).
The lysate was centrifuged at 24 000g for 45 min at 4 °C and the clarified soluble lysate was filtered using a 0.22 µm filter before being loaded onto a 5 ml HisTrap column (Cytiva) using an EP-1 peristaltic pump (Bio-Rad). Hexahistidine-tagged RdfS (6H-RdfS) was eluted from the column using a linearly increasing concentration of elution buffer [80 mM NaH2PO4, 500 mM NaCl, 850 mM imidazole, 5%(v/v) glycerol; pH 7.4] across a total of ten column volumes (50 ml) on an ÄKTA pure chromatography system (GE Healthcare).
Peak fractions were pooled, concentrated and stored at room temperature for immediate use or flash-frozen in liquid nitrogen for long-term storage at −80 °C.

Crystallization
For crystallization, the protein tags were removed. Crystallization trials were explored using a variety of sparse-matrix screens: Index HT (Hampton Research), Crystal Screen HT (Hampton Research), JCSG-plus (Molecular Dimensions; Newman et al., 2005), the LMB Crystallization Screen (Molecular Dimensions) and ProPlex (Molecular Dimensions). Each screen was set up in sitting-drop vapour-diffusion format in a three-drop 96-well ARI LVR Intelli-Plate (Hampton Research) at 293 ± 0.5 K using an Art Robbins Phoenix robot. Each 300 nl drop consisted of protein solution (9 mg ml−1) and reservoir solution in a 1:1, 1:2 or 2:1 ratio equilibrated against 80 µl reservoir solution. While crystallization was observed in dozens of conditions across the entire plate, the larger needle-shaped crystals that formed in condition No. 38 of the ProPlex screen [0.1 M MES pH 6.5, 10%(w/v) PEG 5000 MME, 12%(v/v) 1-propanol] appeared to be the most promising. Further optimization attempts varying the pH and the precipitant concentration in hanging drops in a 24-well VDX plate (Hampton Research) produced similar crystals of larger size (~50 × 1000 µm) which diffracted poorly (>10 Å). One variation of the initial condition was selected for a modified protocol of the Additive Screen matrix (Hampton Research) set up across four hanging-drop 24-well VDX plates, with each drop consisting of 2.5 µl protein (8.8 mg ml−1), 2 µl crystallization condition [0.1 M MES pH 6.5, 8%(w/v) PEG 5000 MME, 10%(v/v) 1-propanol] and 0.5 µl additive condition equilibrated over a reservoir containing 100 µl of the additive condition only. This unusual setup was the result of an error which proved to be productive. In the presence of additive No. 23 (1 M sodium citrate tribasic dihydrate) RdfS formed large, multi-nuclear crystals of >1 mm in length.
Replicate trials with identical conditions were set up with four drops per well in a 24-well VDX hanging-drop tray (a total of 96 identical drops), with three individual drops producing plate-like rod crystal clusters [the final conditions in the drop were 0.05 M MES pH 6.5, 4%(w/v) PEG 5000 MME, 5%(v/v) 1-propanol, 0.1 M sodium citrate with RdfS protein at 4.3 mg ml−1]. Crystal clusters were harvested and transferred to a cryoprotectant condition [0.08 M MES pH 6.5, 6.4%(w/v) PEG 5000 MME, 8%(v/v) 1-propanol, 20%(v/v) ethylene glycol] for 2 min. Micro-Tools (Hampton Research) were used to separate individual crystals from the clusters before harvesting and flash-cooling in liquid nitrogen (Haas & Rossmann, 1970; Henderson, 1990). Diffraction experiments were carried out on the MX2 beamline at the Australian Synchrotron, Melbourne, Victoria, Australia (Aragão et al., 2018) using remote access via the Blu-Ice software (McPhillips et al., 2002). The best data set was collected at 13.0 keV (λ = 0.953 Å) with a crystal-to-detector distance of 380 mm (2.49 Å at the detector top edge).
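The relationship between the photon energy and the wavelength quoted for data collection can be checked with a short calculation. The following is a minimal sketch, assuming only the standard relation λ = hc/E; the function name is illustrative and is not taken from any beamline software.

```python
# hc expressed in keV·Å (rounded CODATA value); used for lambda = hc / E.
HC_KEV_ANGSTROM = 12.398419843


def energy_to_wavelength(energy_kev: float) -> float:
    """Return the X-ray wavelength in Å for a photon energy given in keV."""
    if energy_kev <= 0:
        raise ValueError("photon energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


# Energy quoted for the best data set (13.0 keV).
wavelength = energy_to_wavelength(13.0)
print(f"{wavelength:.4f} Å")
```

For a nominal energy of 13.0 keV this gives a wavelength of approximately 0.954 Å, which agrees with the quoted 0.953 Å to within the rounding of the energy.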

Results and discussion
Solution X-ray scattering of monomeric 6H-RdfS

Recombinant 6H-RdfS could be expressed and purified with a high yield (~20 mg per litre of culture) suitable for biophysical analysis. Preliminary experiments were ambiguous with respect to the oligomerization state of RdfS in solution, so we used analytical SEC and SEC-SY-SAXS of 6H-RdfS to explore this in detail. 6H-RdfS elutes from the SEC column as a single peak with a long trailing edge and with an A260 nm:A280 nm ratio of 0.5, indicating pure protein (Fig. 1a). SAXS analysis of the peak yielded an experimental scattering curve exhibiting the properties of a monodisperse sample (Fig. 1). Guinier analysis provided a radius of gyration (Rg) of 22 Å, and the pair-distribution function shows a maximal dimension (Dmax) of approximately 92 Å, both of which are reasonable for a 6H-RdfS monomer with a prolate shape. SAXS-derived molecular-mass estimates of 12 046 Da (using a qmax of 0.300 Å−1 and a V of 14 600 Å3; Fischer et al., 2010) and 12 025 Da (Bayesian inference estimate; 96.22% credibility interval probability; Hajizadeh et al., 2018) also compare favourably with the expected molecular mass of 12 172 Da (as calculated by ProtParam; Wilkins et al., 1999). A dimensionless Kratky plot shows a peak slightly beyond a height of 1.1 and a qRg of 1.7, suggesting that 6H-RdfS is elongated with significant flexibility, which is also supported by the increase in intensity at qRg > 5 indicating particle flexibility/disorder (Fig. 1d; Durand et al., 2010; Bizien et al., 2016; Trewhella et al., 2017). Porod volume estimates of 230 Å suggest extremely large, elongated particles in solution. These data support the hypothesis that a substantial proportion of the 6H-RdfS monomer is disordered in solution, as it contains an N-terminal His6 tag, a flexible linker and a TEV cleavage site, and the final 31 residues of the native protein are predicted to be highly disordered by MobiDB-lite (Necci et al., 2021).
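The reference point used when reading a dimensionless Kratky plot can be illustrated numerically. The sketch below transforms a scattering curve into dimensionless Kratky coordinates and locates the peak for a synthetic curve that follows the Guinier approximation, I(q) = I(0)exp(−q²Rg²/3), which is a common stand-in for a compact globular particle. The synthetic curve is an assumption made for illustration and is not the measured 6H-RdfS data; only the Rg of 22 Å and the qmax of 0.300 Å−1 are taken from the text, and the function names are hypothetical.

```python
import math


def guinier_intensity(q: float, rg: float, i0: float = 1.0) -> float:
    """Guinier approximation I(q) = I(0) * exp(-(q*Rg)^2 / 3)."""
    return i0 * math.exp(-((q * rg) ** 2) / 3.0)


def dimensionless_kratky(q_values, intensities, rg, i0):
    """Return (q*Rg, (q*Rg)^2 * I(q)/I(0)) pairs for a scattering curve."""
    return [
        (q * rg, (q * rg) ** 2 * i / i0)
        for q, i in zip(q_values, intensities)
    ]


rg = 22.0  # Å, radius of gyration reported from Guinier analysis
q_values = [n * 0.0005 for n in range(1, 601)]  # Å^-1, up to 0.300 Å^-1
intensities = [guinier_intensity(q, rg) for q in q_values]  # synthetic curve

curve = dimensionless_kratky(q_values, intensities, rg, i0=1.0)
peak_qrg, peak_height = max(curve, key=lambda point: point[1])
print(f"peak at qRg = {peak_qrg:.2f}, height = {peak_height:.2f}")
```

For this idealized compact case the peak sits at qRg = √3 (about 1.73) with a height of 3/e (about 1.10). A measured peak displaced beyond this point, together with intensity that rises again at high qRg, is the behaviour interpreted above as elongation and flexibility.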
Given the propensity of the 6H-RdfS molecules to be disordered in solution, models were generated to visualize flexibility using the X-ray structure as a template. A filtered scattering envelope shows an elongated (80 Å) particle (Fig. 1e), which is slightly smaller than the idealized Dmax calculated using the pair-distribution function (Supplementary Fig. S1). An ensemble of EOM models is shown in Fig. 1(f), highlighting the flexibility of the N-terminal region in solution with a reasonable fit (χ2 = 0.27; two degrees of freedom). Further models of the RdfS structure fitted into the scattering of 6H-RdfS, including the addition of dummy residues for the additional N- and C-terminal regions that are not represented in the X-ray structure, can be found in Supplementary Fig. S2.

Crystallization and X-ray data processing
RdfS crystallized readily across a broad range of screening conditions, with crystals consistently forming in approximately 17% of conditions in Index HT (Hampton Research), 10% of conditions in Crystal Screen HT (Hampton Research), 23% of conditions in JCSG-plus (Molecular Dimensions), 24% of conditions in the LMB Crystallization Screen (Molecular Dimensions) and 23% of conditions in ProPlex (Molecular Dimensions). Adjustments to the concentration of protein used and the volume of the drop appeared to have little impact on the number of crystals or of condition 'hits'; instead, the volume ratio of protein sample (in SEC buffer) to crystallization condition had the largest discernible impact on the size and number of crystals within a single drop. Crystal-forming conditions containing either a 1:1 ratio of protein to crystallization condition (present in approximately 87% of all crystals formed) or a 1:2 ratio (~73%) yielded the most crystals (a 2:1 ratio only appeared in ~43% of all crystal-forming conditions).
Hundreds of crystals from a variety of conditions were tested for diffraction with no success, which was perhaps not surprising given their thin, uneven needle-like habit (Fig. 2a). Whilst these crystals varied in size, they typically formed very narrow (<10 µm width), highly elongated needles or prisms (Supplementary Fig. S3). No improvement in diffraction was seen with the MX2 mini beam (25 × 15 µm) compared with the larger MX1 beam (120 × 120 µm) for crystals with similar morphology. Condition No. 38 [0.1 M MES pH 6.5, 10%(w/v) PEG 5000 MME, 12%(v/v) 1-propanol] from the ProPlex screen yielded the most promising large (>500 µm) multinuclear crystals, which showed evidence of diffraction to 10 Å resolution (Supplementary Fig. S3). Optimization of the pH and the PEG and propanol concentrations [to 0.1 M MES pH 6.5, 8%(w/v) PEG 5000 MME, 10%(v/v) 1-propanol] was used to reproducibly obtain larger crystals.

Figure 1 (caption, partially recovered). [...] (Besl & McKay, 1992) metric was 14.7 (±2.5) for all 11 DAMMIN models. (f) EOM ensemble of six RdfS cartoon models fitted into the 6H-RdfS SAXS scattering data, with the wHTH domain (black) rigid and the N-terminal helix (coloured) disordered. The various colours demonstrate the flexibility of the N-terminal helix as fitted to the scattering data (red, green, yellow, blue, purple and pink).
To improve the diffraction properties, we screened additives and pursued a strategy capitalizing on an error in which we screened for crystallization against reservoirs that contained only concentrated additive solution. Serendipitously, we found that this produced multiple crystal-forming conditions in initial trials with the Hampton Research Additive Screen. Importantly, one of the conditions yielded a single large multinuclear crystal (>1 mm; Fig. 2b) which diffracted to 4 Å resolution (data not shown). We replicated this experimental condition [0.05 M MES pH 6.5, 4%(w/v) PEG 5000 MME, 5%(v/v) 1-propanol, 0.1 M sodium citrate tribasic dihydrate with 4.3 mg ml−1 RdfS equilibrated against a reservoir consisting of 1 M sodium citrate tribasic dihydrate] in 96 identical drops, which showed low reproducibility (three successes out of 96 replicates; 3.1%), suggesting that variability in the process of setting up drops or uncontrolled nucleation events affected the outcome. Nevertheless, crystals from a single drop containing a typical large cluster of flat, rod-like crystals (Fig. 2c) were harvested, cryoprotected in mother liquor supplemented with 20% ethylene glycol and flash-cooled for storage. We suspect that the effectiveness of our additive trials and the overall increase in diffraction quality of the resulting crystals are the result of a 'pseudo-salting-out' technique. We assume that because the additive-only reservoir solution caused a decrease in water and 1-propanol in the drop over time via vapour diffusion, idealized conditions fortuitously enabled RdfS to form crystals which diffracted well in a small percentage of replicate conditions. RdfS crystals diffracted to 2.45 Å resolution on beamline MX2 at the Australian Synchrotron (Aragão et al., 2018; Fig. 2d). Data analysis clearly indicated space group P212121, with unit-cell parameters a = 35.25, b = 119.20, c = 123.40 Å (Table 2). [...] asymmetric unit to be most likely (46% probability), while a self-rotation function suggested that four or eight molecules per asymmetric unit were possible. With such high Z′ values and no experimental model of a close homologue, molecular replacement was expected to be challenging.

Figure 2. Crystallization and diffraction of RdfS. (a) An example crystal morphology from sparse-matrix crystallization screens, with a thin, rod-like crystal shape observed. (b) Initial additive trial multi-nuclear crystal as described in Section 3.2. (c) Final crystal morphology of the RdfS crystals used to generate the solved data set prior to separation with tools and cryoprotection. (d) Diffraction pattern from the first ten frames (1°) of data with a resolution indicator (dashed circle). An image of a crystal from the solved data set mounted in a loop is shown in the bottom right corner.
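The question of how many molecules occupy the asymmetric unit is usually approached through the Matthews coefficient, and the arithmetic can be sketched as follows. The unit-cell parameters and space group are those reported above; the chain mass is a placeholder assumption (the exact mass of the crystallized, tag-free construct is not stated in this section), so the printed values are illustrative only and are not expected to reproduce the reported solvent content. The solvent fraction uses the common approximation 1 − 1.23/VM.

```python
def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Volume (Å^3) of an orthorhombic unit cell with edges a, b, c in Å."""
    return a * b * c


def matthews(cell_volume: float, n_symops: int, n_copies: int, mass_da: float):
    """Return (V_M in Å^3 per Da, approximate solvent fraction)."""
    vm = cell_volume / (n_symops * n_copies * mass_da)
    solvent_fraction = 1.0 - 1.23 / vm  # standard approximation for proteins
    return vm, solvent_fraction


CELL_VOLUME = orthorhombic_cell_volume(35.25, 119.20, 123.40)
N_SYMOPS = 4  # P2(1)2(1)2(1) has four symmetry operators
ASSUMED_MASS_DA = 10_000.0  # hypothetical chain mass, NOT taken from the paper

print(f"unit-cell volume: {CELL_VOLUME:.0f} Å^3")
for n_copies in range(1, 9):
    vm, solvent = matthews(CELL_VOLUME, N_SYMOPS, n_copies, ASSUMED_MASS_DA)
    print(f"{n_copies} copies: V_M = {vm:.2f} Å^3/Da, solvent = {solvent:.0%}")
```

Replacing the placeholder mass with the true mass of the crystallized construct would allow the candidate copy numbers to be compared directly; probability estimates of the kind quoted above additionally weight each VM by its observed frequency among deposited structures at a given resolution.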

Molecular replacement
Despite the lack of a close homologue to RdfS, the structure was solved by molecular replacement using MOLREP (Vagin & Teplyakov, 2010) from the CCP4 suite. In the absence of structures of close homologues, ab initio predictions from AlphaFold were used as search models both individually and as a cluster of five models (Fig. 3). Initial molecular-replacement attempts used fully intact models to maximize the scattering power of the model, but failed, most probably due to structural variation and disorder at the termini of the models (Fig. 3c and Supplementary Table S1), an observation that was supported by disorder analysis with DisEMBL (version 1.5; Linding et al., 2003) and the pLDDT plot of each model (Fig. 3a). Manual investigation of the five individual ab initio models showed variability in the orientation of the N-terminal α-helix and the extended C-terminal tail, so a single truncated RdfS AlphaFold model (residues 16-64; RdfS16-64) was utilized in all further molecular-replacement attempts (Fig. 3b). RdfS16-64 yielded promising results, with three protomers located with strong rotation-function and translation-function scores. Refinement of this model stalled, so we manually inspected the structure and electron density in Coot, with two notable observations: (i) the three molecules were arranged in a repeating head-to-tail manner (i.e. the interface of A:B was replicated at B:C) and (ii) there was notable electron density adjacent to molecule C (Fig. 4a). Furthermore, structural alignment of the A:B:C 'trimer' on itself (with A superimposed on C; Fig. 4b) placed molecule B in unmodelled electron density and molecule C directly on top of molecule A′ from an adjacent asymmetric unit. This was unlikely to be a coincidence and gave us confidence that the structure contains four molecules per asymmetric unit arranged in a head-to-tail fashion that continues throughout the crystal in one dimension, resulting in a protein polymer with eight molecules per helical turn. The refinement of four RdfS16-64 molecules (Fig. 3d) proceeded as expected, with a substantial reduction of Rfree, and each copy was highly similar to the others in the asymmetric unit [average r.m.s.d. of 0.32 Å calculated using PyMOL (version 1.8; Schrödinger) with default cycle parameters; Supplementary Table S2]. A finalized structure refined to an Rwork of 19.9% and an Rfree of 24.6% and validated using MolProbity (Chen et al., 2010) has been deposited in the PDB (PDB entry 8dgl) and will be described in detail elsewhere.

Conclusion
The finalized RdfS monomer structure closely matches a majority (~57%) of the ab initio model predicted by AlphaFold; it contains a single winged-helix-turn-helix domain, as is common among DNA-binding proteins (Aravind et al., 2005; Brennan, 1993), and two additional helices at the N- and C-termini. We note that the 21 C-terminal amino acids of the protein are completely disordered, as expected from their sequence PPEPGSDDDKGGSGSADEGARS. This represents 24% of the native protein sequence. Sequence comparisons reveal that this region of the protein is highly variable among RdfS homologues, likely due to the overlapping reading frames of rdfS and its downstream gene traF (Haskett et al., 2016; Sullivan et al., 2002), which suggests that the protein sequence of this region may not be the trait under selection. It is possible, however, given the variability among relatives of the RdfS protein, that this extreme C-terminal region may regulate currently undiscovered species-specific functions of RdfS in vivo, such as aggregation or other forms of protein interaction, although this has not been experimentally tested. It is likely that the highly flexible nature of this C-terminal region, as partially described by the 6H-RdfS SAXS data detailed in this manuscript (Fig. 1), interfered with crystallization and contributed to the poor crystal quality.
Subsequent to this work, best practices for using AlphaFold models for molecular replacement have been developed (McCoy et al., 2022). Analysis of the pLDDT confidence scores can be used to determine which residues to include from an ab initio model. Specifically, running phenix.process_predicted_model (Liebschner et al., 2019) with the AlphaFold models of RdfS resulted in a trimmed model of residues 16-66, which closely matched our manually created model (Fig. 3b). When manipulating AlphaFold-sourced structures, care should be taken to introduce realistic B factors into models in place of pLDDT scores, as implemented in the MOLREP 'SURF Y' command (Vagin & Teplyakov, 2010).
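The two operations described above, trimming low-confidence residues and replacing pLDDT scores with B factors, can be sketched in a few lines. This is an illustrative stand-in and not the implementation used by phenix.process_predicted_model or MOLREP: the 70% cutoff, the function names and the exact constants of the empirical pLDDT-to-error conversion are assumptions made here. The sketch relies only on the fixed-column PDB ATOM record layout, in which AlphaFold writes the per-residue pLDDT in the B-factor field.

```python
import math


def plddt_to_b_factor(plddt: float, b_max: float = 999.99) -> float:
    """Convert a pLDDT score (0-100) into a pseudo B factor (Å^2).

    Assumed empirical form: rmsd = 1.5 * exp(4 * (0.7 - pLDDT/100)) and
    B = (8 * pi^2 / 3) * rmsd^2, capped to fit the six-character PDB field.
    """
    rmsd = 1.5 * math.exp(4.0 * (0.7 - plddt / 100.0))
    return min((8.0 * math.pi ** 2 / 3.0) * rmsd ** 2, b_max)


def trim_by_plddt(pdb_lines, cutoff: float = 70.0):
    """Keep ATOM records with pLDDT >= cutoff and rewrite their B factors."""
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM, TER and other records
        plddt = float(line[60:66])  # B-factor field holds the pLDDT
        if plddt < cutoff:
            continue
        b_factor = plddt_to_b_factor(plddt)
        kept.append(f"{line[:60]}{b_factor:6.2f}{line[66:]}")
    return kept


def residue_range(pdb_lines):
    """Return (first, last) residue numbers present, or None if empty."""
    numbers = [int(line[22:26]) for line in pdb_lines]
    return (min(numbers), max(numbers)) if numbers else None
```

A simple threshold like this can leave isolated residues or gaps, so the reported range is only the outer bounds of what was kept; dedicated tools apply further processing, and their output should be preferred for real search models.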
One of the axial views of the crystal structure (Fig. 4c) reveals a highly porous crystal, which is consistent with the observed solvent content of 71%. The porosity and limited contact between protomers may also explain the challenging crystallization and poor diffraction, as more loosely packed crystals typically diffract more weakly than more tightly packed crystals (Matthews, 1968, 1976; Podjarny et al., 2002; Kantardjieff & Rupp, 2003).
Despite its relatively small size, RdfS proved to be a challenging protein to crystallize and its structure was difficult to solve. It is likely that one cause of this difficulty is the observed propensity of the protein to form polymeric filaments. The biological significance of this quaternary structure has yet to be confirmed; however, the superficial similarity to the polymers formed by other structurally similar proteins such as BldC (Schumacher et al., 2018; Dorman et al., 2020) and Xis (Abbani et al., 2007) may suggest a role for cooperative DNA binding in the regulation of ICE excision, the activation of conjugative transfer and transcriptional regulation. Further investigations of the biological implications of the quaternary structure of RdfS, along with discussion of its protein-nucleic acid interactions, will be presented in the future.